Package Imports

The Business Problem

FalseSouth is in the transportation industry, and we help drivers book and find loads with brokers as well as take care of drivers’ back office operations. FalseSouth usually gets paid directly by brokers for the service of hauling loads. However, FalseSouth is having trouble assessing if our collections process has been successful and how much money we are receiving from the brokers. Therefore, the FalseSouth team has assigned to you the task of digging into the accounts receivable data to help with the collections.

Please address the following questions and deliverables:

  1. Please identify and list out any potential data errors and/or issues.
  2. How much has FalseSouth invoiced to all brokers?
  3. How much has FalseSouth collected?
  4. What balance remains to be collected?
    • How much of that is overdue?
    • Please create a chart by month showing invoiced amount split by paid vs. not paid
    • Please provide some descriptive statistics with regards to invoicing data provided.
      • In your summary statistics, please answer the following questions:
        • What is the median invoice amount?
        • What is the mean/median/mode number of invoices per broker?
  5. To support our collections effort, please create a separate statement for each of the brokers of all their invoices and balances. A sample template for the statement has been provided in the attached Excel. (Hint: Please create a script that generates the statements as opposed to manually creating them. We are looking to assess your technical skills in the completion of the assignment deliverables.)

Ingest and Clean Data

General Note: Depending on the size and end state of a project, I would make the decision on whether to leave any utility functions (like the one below) "in-line" or move out to a local module. To demonstrate the workflow, and hopefully make the final document more interpretable, I am leaving them in place where utilized.

Data Quality Exploration

General Note: At this point, I generally do a couple of basic checks. I want to make sure my data types are what I expect, I want to understand null/missing values, and I want to get a general feel for the numbers we are dealing with (basic stats). Oftentimes, these checks will raise other questions, and additional steps are taken as needed.

Data Quality Note: 30 Null Values in the LOAD_NUM column. For this exercise, I don't think this will effect the deliverables as we appear to have a good INVOICE_ID and no duplicates there. In a real world scenario, the load number can be quite important and will likely be requested by the broker if an overdue invoice is brought to their attention without it for reference. I would also be very curious if the the missing data were truly in sequential order as it is here too...I would want to know if it was a momentary lapse in data collection, or ensure that there is not a systemic problem.

Data Quality Note: When checking descriptive stats, I want to get a feel for the spread, but also potential upper and lower outliers.

General Note: Outside of the checks above, the COMPANY_NAME field jumped out at me immediately when reading in our data. Having dealt with human input and naming inconsistencies in just about every system I have ever worked in, seeing the capitalization differences, as well as the names that are VERY similar immediately raise flags for me.

Data Quality Note: After exploring the COMPANY_NAME occurrances, I am fairly confident there are some duplicate accounts (some examples below.) In a real world scenario, this would lead me back to the data source to search for an attribute that might help clarify...my first target would be an Account Number or Address to help differentiate. After that, maybe searching for a "Parent" or "Payer" level account number to see if a business hierarchy exists, as this is often the case within large organizations. It could be that each are actually seperate brokers under a parent account, and we want to report to this "Sold-To" level. Either way, it would warrant revisiting the data to understand, and likely an internal business decision to be made by stakeholders.

Business Questions

How much has FalseSouth invoiced to all brokers?

How much has FalseSouth collected?

What balance remains to be collected?

How much of total outstanding is overdue?

Calculated on 3/28/2022

Please create a chart by month showing invoiced amount split by paid vs. not paid.

What is the median invoice amount?

What is the mean/median/mode number of invoices per broker?

Broker Statement Script

General Note: When approaching any output deliverable, I try to put myself in the position of the person that is likely to consume it. The obvious use of a statement would be to have the ability to send it out to the customer. These could also be used internally to process and accelerate collections of accounts. Whether internal or external, it seemed to me that leaving no question in regards to the status of invoice items would be important. With that in mind, my thought was to clearly separate out the "Overdue Invoices" at the top so there is no ambiguity to the reader, and the remainder of "Paid Invoices" below that.